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3D SCANNING USING SHADOWS 

Cross Reference To Related Applications 
This application claims the benefit of the U.S. Provisional 
Applications No. 60/169,103, filed 12/6/99; 60/169,102, filed 
12/9/99; and 60/183,172, filed on 02/17/00. 

Background 

Three-dimensional scanning has recently become popular. 
However, many of the scanners used for such scanning trade-off 
between cost of acquisition and/or use, accuracy, ease-of-use 
and/or speed of acquisitions. Many commercial 3-D scanners 
emphasize the accuracy over all other parameters. 

Scanners such as the Cyberware scanner may use an active 
illumination system in a fixed installation with controlled 
lighting. The object is transported using a motorized transport 
system. This can significantly increase the cost of the system. 
Moreover, this can make the system difficult to use in many 
installations, such as outdoors, where lighting may be difficult 
to control. 

Summary 

The present application teaches a system which allows 
scanning of 3D objects using shadow imaging. 



Attorney Docket No. 06618/565001 
In an embodiment, the system requires only a computer and a 

camera. The computer may be programmed to image shadows, and 
determine three dimensional information of an object from the 
imaged shadows . 

Brief Description of the Drawings 
These and other aspects will now be described in detail 
with reference to the accompanying drawings, wherein: 

Figure 1 shows a general set up showing three-dimensional 
objects and the shadow cast on those objects; 

Figure 2 shows the geometric principle of the technique 
used to recover three-dimensional information; 

Figure 3 shows a flowchart of operation, showing the 
operations that the computer carries out to recover this three- 
dimensional information; 

Figure 4A and 4B show camera calibration; 
Figure 5 shows a three dimensional calibration; 
Figure 6A, 6B and 6C show using a pencil of known height to 
calibrate a position of the light source; 

figure 7 shows a summary of global geometry for a system 
with dual background planes and uncalibrated light source; 

figure 8 shows a summary of global geometry with a single 
background planes and uncalibrated light source; 

figure 9 shows scanning using a reference plane; 
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figure 10 shows depth propagation along multiple edges; 

figure 11 shows a flowchart of operation of multiple images 

being obtained; 

figure 12 shows the scanning set up with a light source and 
a straight edge at dual positions; 

figure 13 shows will coordinate transformation for the 
system; and 

figure 14 shows double-edged intersection using dual light 
sources . 

Detailed Description 
The present application describes capturing three- 
dimensional surfaces using a simplified system. In one 
embodiment, the lighting can be "weakly structured", i.e, any 
light source can be used. 

The general principle is shown in Figure 1. The scene to 
be imaged 100 is shown with a shadow 105 on the scene. The 
shadow 105 is produced by an edge, e.g. a straight edge such as 
a pencil or a stick, and is caused to move across the scene 100. 
The human operator produces the movement. 

A camera 110 monitors the image of the scene including the 
shadows, and produces an output which is sent to a processor 
120. The processor estimates the three-dimensional shape of the 
scene from the sequence of images of the shadow as deformed by 
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three-dimensional elements of the scene. The processing 

operates to extract scene depth at least at a plurality of 
pixels in the image, more preferably at every pixel in the 
image. It should be understood, however, that the pixels could 
be of any size. 

More detail about the acquisition, including the geometry 
of the acquisition, is shown in Figure 2. A light source, shown 
approximated as point source 200, illuminates the scene 199. 
The intersection of the point source and the leading edge of the 
pencil define a plane A(t), at each time instant. Therefore, 
the boundary of the pencil's shadow on the scene comprises the 
intersection between the plane A(t), and the surface of the 
object . 

In Figure 2, the plane n h represents the horizontal plane, 
and the plane n v represents a vertical plane orthogonal to n h . 
This vertical plane is optional. In the Figure 1 setup for 
example, the plane nv is removed. The position of the plane n h 
in the camera reference plane will be known from calibration as 
explained further herein. The position of n v is also inferred 
from a projection A T at the intersection line between IT h and n v . 

The end goal is to obtain three-dimensional coordinates of 
each point P in the three-dimensional scene. This may be done 
one pixel at a time. The three-dimensional location of the point 
P in space will be estimated corresponding to every pixel p at 
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coordinates x c in the image obtained by the camera. Effectively, 

the user projects a succession of N shadows on the scene, 

thereby generating N shadow edges where 1=1. . . N. The 

points on this edge are estimated through triangulation, as 

described herein. 

The shadow time t is defined as the time when the shadow 
boundary passes by a given pixel along a line x c . n t represents 
the corresponding shadow plane at that time t. The leading edge 
of the shadow is represented by the value A. 

The projection of A on the image plane A is obtained by the 
camera at 305. This is done by a temporal processing operation 
of estimating the shadow time t s (x c ) at each pixel x c . This 
temporal processing is followed by a spatial processing, in 
which the projection of A that is seen by the camera is 
represented by values A. Therefore, the two portions of the 
shadow projected on the two planes n h and n v are visible on the 
image as the lines Ah and A v . After extracting these two lines, 
the location in space of the two corresponding lines A h and A v 
are obtained at 310. This can be done by forming a plane 
between the camera and the A lines. In Figure 2, the camera 
origin is shown as 0 C . A plane between 0 C and the A is can be 
obtained. These planes (0 C , A(t), and (O c , A(t)) are intersected 
with the image planes n h and n v respectively. The shadow plane n t 
is then found at 315. If the vertical plane has been used for 
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scanning, then the shadow plane is the plane defined by the two 

non collinear lines A h and A v at 315. <If the vertical plane has 

not been used for scanning, however, the shadow plane may still 

be inferred from the only available line A H and the position of 

the point light source S. in this latter case, point light 

source needs to be at a known and fixed location in space. 

At 320, the actual point P corresponding to the pixel x c is 

retrieved by triangulating II of T with the optical ray O c from 

the camera. 

If the light source S is at a known location in space, then 
the shadow plane II (t) may be directly inferred from the point S 
and the line Ah. Consequently, no calculation of the additional 
plane II (t) is not required. Therefore, two different 
embodiments are contemplated: a first having two calibrated 
planes (II h and n v ) and an uncalibrated light source, and the 
second having one calibrated plane and a calibrated light 
source . 

Camera Calibration 

Camera calibration can be used to recover the location of 
the ground plane n h and the intrinsic camera parameters. The 
intrinsic camera parameters may include focal length, principal 
point, and radial distortion factor. A first embodiment of the 
calibration technique herein uses a planar checkerboard pattern. 
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Figures 4A and 4B show the setup which includes a planar 

checkerboard pattern 400 placed on the imaging area, e.g., the 
ground or table surface, in a general location of the objects to 
be scanned. The camera 4 02 is pointed at this general area. 
The camera 4 02 captures an image shown in Figure 4B. This image 
is used to infer the intrinsic and extrinsic parameters of the 
camera . 

The projections on the image planes of the known grid 
corners are matched to the expected projection that is directly 
on the image. This is described in more detail in Tsai "A 
Versatile Camera Calibration Technique for High Accuracy 3-D 
Machine Vision Metrology using off-the-shelf TV cameras and 
Lenses". A first order symmetric radial distortion model is 
used for the lens. When this technique is used with a single 
image, the principal point, that is the intersection of the 
optical axis with the image plane, cannot be easily recovered. 
This principal point is therefore assumed to be identical with 
the image center. A full camera model may be used which 
integrates multiple images of the planar grid at different 
locations in space with different orientations, e.g., using 
three or four images. 

It has been found that the quality of reconstruction is 
relatively insensitive to errors in the principal point 
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position. Therefore, either of the calibration techniques can 

be used. 

As described above, when a single reference plane, e.g. n h , 
is used for scanning, the location of the light source must be 
known in order to infer the shadow plane location n t . Another 
embodiment allows calibration in the light source using an item, 
e.g. a pencil or a stake of known length. The operation and 
geometric theory is shown in Figures 6A-6C. In Figure 6A, the 
pencil is shown standing on the reference plane n h . The camera 
605 observes the shadow of the pencil 610 as projected on the 
ground plane. The acquired image is shown in Figure 6B. Two 
points: b and t s are found in this acquired image, as shown in 
Figure 6B. 

Figure 6C shows a geometrical diagram. The points b and t s 
can be triangulated to obtain the position in space of pencil 
points B and T using the known the height of the pencil H. 
Effectively, the coordinates of the pencil tip T can be directly 
inferred from B. As shown, this is done by intersecting optical 
rays (Oc,b) and (0c,t s ) with the n h that is known from camera 
calibration. The point light source S therefore must lie in the 
plane A = (T, T s ) in space. This by itself yields a first linear 
constraint on the position of the light source. 

By taking a second view, with the pencil at different 
locations on the plane, a second independent constraint with 
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another line A prime can be obtained. A closed form solution 

for the 3-D coordinates of S may then be derived by intersecting 
the two lines A and A prime using a least squared combination. 

Two views may provide sufficient information. However 
since the problem is linear, the estimation can be made more 
accurate by obtaining more than two views. 

Operation 305 in Figure 3 requires determining lines of 
intersection between the shadow plane II of T. and the two planes 
n of h and n of v. This may be done by the spatial processing 
of localizing edges of the shadow that are directly projected 
onto orthogonal planes at each discrete time t to form the set 
of all shadow planes n of T. Temporal processing also, and also 
estimating the time t s as the shadow time, where the edge of the 
shadow passes through any given pixel x c =x c ,y c in the image. 

These processing tasks allow finding the edge of the 
shadow. However, the search domains for the spatial and 
temporal processing tasks may be different. The spatial task 
operates on the spatial coordinates or image coordinate system, 
while the temporal task operates on the temporal coordinates 
system. One aspect of the present system enables making the two 
search procedures more compatible so that at any time to when 
the shadow passes through a pixel x Cf the two searches still find 
the exact same point (x, y, tO) in space-time. 
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A technique is described herein called spatio-temporal 

thresholding. This thresholding is based on the observation 

that, as the shadow is scanned across the scene, each pixel x,y 

sees its brightness intensity going from, an initial maximum 

value Imax down to a minimum value l m i n - The pixel then returns 

to its initial value as the shadow goes away. This profile is 

characteristic even when there is a non-neglible amount of 

internal reflections in the scene. 

For any given pixel x c = (x,y), define I min (x,y) and 
Imax(x,y) as its minimum and maximum brightness throughout the 
entire sequence: 

min 
max 

4ax(*>j0= t {i(x,y,t)} 

The shadow is defined as to be the locations (in space- 
time) where the image I(x,y, t) intersects with the threshold 
image I S hadow (x, y) ; defined as the mean value between I ma x(x,y) and 
lmin(x,y) : 

Lhado, <X y) = ^ (4ax (*> y) + 4* 0> y)) 



This may be also regarded as the zero crossings of the 
difference image AI(x,y,t) defined as follows: 
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AI(x,y,t) = I(x,y,t)-I shadow (x,y) 

The concept of dual space may assist in certain 
calculations . 

SHADOW PLANE ESTIMATION II (t) 

Denote by a (t) , I v (t) and I v (t) the coordinate vectors of 
the shadow plane n(t) and of the shadow edges A h (t) and X v (t) at 
time t. Since X h (t) is the projection of the line of 
intersection A h (t) between n(t) and n h , then (t) lies on the line 
passing through m h with direction I h (t) in dual-space (from 
proposition 1 of section 2.2.2). That line, denoted A h (t), is 
the dual image of A h (t) in dual-space as described above (see 
section 2.2.2). Similarly, co (t) lies on the line A v (t) passing 
through o) v with direction A v (t) (dual image of A v (t)). 
Therefore, in dual-space, the coordinate vector of the shadow 
plane 0) (t) is at the intersection between the two known lines 
A h (t) and A v (t) . In the presence of noise, these two lines might 
not exactly intersect (equivalently, the 3 lines X ir A h (t) and 
A v (t) do not necessarily intersect at one point on the image 
plane, or their coordinate vectors I ir I h (t) and I v (t) are not 
coplanar) . However, one may still identify a> (t) with the point 
that is closest to the lines in the least-squares sense. The 
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complete derivations for intersecting a set of lines in space 

may be found in section 4.1.2. When interesting the two lines 

A h (t) and A v (t) in space , the solution reduces to: 



^(0=^(0+^(0), 



with 



gt,(0 = m h +a h A h (t) 
m 2 (t) = m v +a y X v (t) 



where 



a, 



where A is a 2 x 2 matrix and b is a 2-vector defined as follows 
(for clarity, the variable t is omitted) : 



A = 



< Xv,m h -m v > 



Note that the two vectors a>i(t) and 0)2(2) are the 
orthogonal projections, in dual-space, of to (t) onto A h (t) and 
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Av(t) respectively. The norm of the difference between these two 

vectors may be used as an estimate of the error in recovering 
n(t). If the two edges A h (t) and A v (t) are estimated with 
different reliabilities, a weighted least square method may 
still be used. 

Using the additional vertical plane n v enables extracting 
the shadow plane location without requiring the knowledge of the 
light source position. Consequently, the light source is 
allowed to move during the scan. This may allow use of natural 
light, e.g. the sun, for example. 

When the light source is of fixed and known location in 
space, the plane n v is not required. Then, one may directly 
infer the shadow plane position from the line A h (t) and from the 
light source position S: 

m{t) = m h + J h {t) 



where 



S e n(Q o < m(t),Xs > = l^a h = > 

<A h (t),X s > 

(6.10) 

where X S =[X S Y s Y S ] T is the coordinate vector of the light source 
S in the camera reference frame. In dual-space geometry, this 
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corresponds to the intersecting the line A h (t) with plane S, dual 

image of the source point S. Notice that <! h (t) , X s > = 0 
corresponds to the case where the shadow plane contains the 
camera center projection 0 C . This is singular configuration that 

may make the triangulation fail ( || tf> (t) || -» <x> ) . This approach 
requires an additional step of estimating the position of S 
using light source calibration. This reconstruction method was 
used in experiments 2 and 3. 

The quantity 1 - <a> h , X s > reduces to h s /d h where h s and d h 
are the orthogonal distances of the light source S and the 
camera center O c to the plane n h . 

Since a? h is the coordinate vector of plane n h , the vector 

n h = d h 6) h is the normal vector of the plane n h in the camera 
reference frame. Let P be a point in Euclidean space (E) of 

coordinate vector X . The quantity d h - <n h ,X> is then the 
(algebraic) orthogonal distance of P to EE h (positive quantity if 
the point P is on the side of the camera, negative otherwise) . 

In particular, if P lies on n h , then <n h , X> = d h , which is 
equivalent to <a) h ,X> = 1. The orthogonal distance of the light 
source S to n h is denoted h s . Therefore h s - d h - <n h ,X>, or 
equivalently 1 - <a> h , X s > = h 3 /d h . 

According to that claim, the constant a h of equation 6.10 
may be written as: 
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h s ld h \ld h 



<Xh{t\X s > <X h (t\X s lh> 



This expression highlights the fact that the algebra 
naturally generalizes to cases the light source is located at 
infinity (and calibrated) . Indeed, in those cases, the ratio 
X s /h s reduces to </ s /sin ^where d s is the normalized light 
source direction vector (in the camera reference frame) and ^the 
elevation angle of the light source with respect to the n h . In 
dual-space, the construction of the shadow plane vector a> (t) 
remains the same: it is still at the intersection of A h (t) with 
S. The only difference is that the dual image S is a plane 
crossing the origin in dual-space. The surface normal of that 
plane is simply the vector d s . 

TRIANGULATION 

Once the shadow time t s (x c ) is estimated at a given pixel x c 
= [x c y c 1] T (in homogeneous coordinates), the corresponding 
shadow plane n(t s (x c )) is identified (its coordinate vector 
o c = co (t s (x c ) ) ) by triangulation at 320. The point P in space 
associated to x c is then retrieved by intersecting II(t s (jc c )) 
with the optical ray (0 c ,x c ): 



15 



Attorney Docket No. 06618/565001 



Z c = - 1 - => X c =Z c x c = - Xc - , 

<Q)c,Xc> <0)c,X c > 

If X c = [X c Y c Z C ] T is defined as the coordinate vector of P in 
the camera reference frame. 

Notice that the shadow time t s (x c ) acts as an index to the 

shadow plane list n (t) . Since t s (x c ) is estimate at sub-frame 

accuracy, the plane Il(t s (x c )) (actually it coordinate vector a> c ) 
results from linear interpolation between the two planes II (to - 

1) and II (to) if t 0 -l<t s (x c )< to and to integer: 

(Oc = Atco(t 0 -1) + (1 - A/)fi>(f 0 ), 

where At = t 0 -t s ( x c ) , 0 < A t<l . 

Pixels corresponding to occluded regions in the same cannot 
provide substantive information. Therefore, only pixels that 
have a contrast value larger than a predetermined threshold are 
used. In the experiments giving herein, where intensity values 
are encoded between zero and 255, this threshold may be 30. The 
threshold may also be proportional to the level of noise in the 
image . 
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Once the shadow time is estimated at any given pixel, the 

shadow plane can be triangulated from its coordinate vector. 

The point in space (P) associated to the given pixel is then 

retrieved by intersecting the shadow plane with this optical 

ray. This is shown in Figure 2. More specifically, 



/- - \ ^ Xc ~ Z c x < = /- - v » 

\G>c,XcJ (G)c,Xc) 



the shadow time therefore ask as an index to the shadow plane 
list. This effectively provides range data to the actual 
object . 

The recovered range data is used to form a mesh by 
connecting neighborhood points in triangles. Connectivity is 
directly given by the image. Two vertices become neighbors if 
their corresponding pixels are neighbors in the image. 
Moreover, since each vertex corresponds to a unique pixel, the 
texture mapping can be relatively easily carried out. 

Figure 7 shows the global geometrical principal of the 
reconstruction technique in normal Euclidean space (left) and 
"dual space" (right) . This Figure shows the correspondence 
between Euclidean space and dual space for objects, e.g. lines 
and points on the image plane, as well as objects in 3-D which 
include planes, lines and points in space. 
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Observe that the calibration of the vertical plane n v is 

also illustrated in the dual-space diagram: its coordinate 
vector co v is at the intersection of the A ± and the set of plane 
vectors orthogonal to co h (defining a plane in dual-space) . The 
line Ai is at the intersection of the two planes n h and n v , and 
it dual image h± is uniquely defined by the horizontal plane 
vector o> h and the vector ~X ir coordinate vector of the line A t 
observed on the image plane. This calibration process is 
described above. 

Once co v is known, the shadow plane vector co (t) associated 
to the shadow edge configuration at time t is at the 
intersection between the two lines A h (t) and A v (t), dual images 
of A h (t) and A v (t). Those two dual lines are defined by the two 
reference plane vectors co h and a> v and the direction vectors 
Ah(t) and A v (t) (vector coordinate of the two image lines X h (t) 
and A v (t)). This processing step is described in detail. 

The final step consisting of identifying the point P in 
space by intersecting the optical ray (0 c ,p) with the shadow 
plane n is also illustrated on the dual-space diagram. In dual- 
space, that stage corresponds to finding the dual image of P 
that is the unique plane in dual-space containing the point a (t) 
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(shadow plane vector) with orthogonal vector x c (homogeneous 

coordinate vector of the image point p) . 

An alternative embodiment uses a single reference plane (El h 
without n v ) with a calibrated light source is summarized on 
Figure 8. This technique uses a different procedure of 
estimating the shadow plane coordinate co (t) . In that version, 
the vector co (t) is at the intersection of the dual line A h (t) 
and the dual image of the light source S. The triangulation 
operation remains unchanged. 

The point P in space is determined by intersecting the optical 
ray O c , P with the shadow plane II. This corresponds to finding 
the dual image of P that is the unique plane in dual space that 
contains the point including the shadow plane vector with the 
orthogonal vector. This is done by triangulation as described 
above . 

The alternate setup uses a single reference plane with a 
calibrated light source as shown in Figure 8. In the Figure 8 
setup, a different procedure is used to estimate the shadow 
plane coordinate vector. 

The accuracy of this system is not necessarily increased by 
decreasing the scanning speed. However, the scanning speed must 
be sufficiently slow to allow each temporary pixel profile to be 
sufficiently sampled. A sharper shadow edge will require slower 
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scanning so that the temporal profile at each pixel can be 

properly sampled. The reality, however, is that the speed of 
moving the shadow is really limited by the response speed of the 
image acquisition element. 

In this system, range data can only be retrieved for pixels 
that correspond to regions in the scene that are illuminated by 
the light source and imaged by the camera. Better coverage of 
the scene may be obtained from multiple scans of the same scene 
keeping the light source at different locations each time and 
keeping the camera position fixed. 

The previous system has described accurately characterizing 
3-D surface based on projection of shadows on a known reference 
plane. Another embodiment describes operating without using a 
background plane as the reference plane, and instead using a 
calibrated light source. 

A summary of the scanning scenario for a flat reference 
plane is shown in Figure 9. In the first embodiment, the plane 
n d is used as a reference plane for scanning. The position of 
the light source S is known and the position of the plane n d is 
also known from calibration/setup. 

A curved edge shadow £ is projected on the image during 
scanning. All points in space P are estimated as the curved 
edge moves across the image, by estimating from the points p on 
5. n denotes the corresponding shadow plane. The image 930 
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corresponds to that image which is seen by the camera. The 

corresponding points A and B in the scene may be found by 
intersecting n d with the optical rays (O c ,a) and (O c ,b) from the 
image. The shadow plane n may then be inferred from the three 
points in space: S, A and B. 

For a given position of the shadow, once the shadow plane U 
is identified, the 3-D image of the entire shadow edge £ can be 
determined by geometric triangulation using, among other things, 
the known position of the reference plane n d . This embodiment 
describes finding the 3D image, using a similar overall 
technique to that described above, but without knowing EL 

The extension to this embodiment takes into account a 
number of issues about the image. First, depth information at 
two distinct points on an edge propagates along the entire edge. 
Also, if depths Z of two distinct points a and b of this shadow 
edge £ are known, then the depth at any other point in the image 
Z P can also be found. Depth at two distinct points propagates 
over the whole image. 

Let x A and x B be the homogeneous coordinate vectors of the 
two points a and b on the image plane x A = [x A V A 1] T . Then if 
the two depths Z A and Z B are know, then so are the full 
coordinate vectors of the associated points A and B in the 3D 
scene: X A = Z A x A and X B = Z B x B . Therefore the associated shadow 
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plane n is the unique plane passing through the three points A, 

B and S. Once n is recovered, any point p along the edge e may 
be triangulated leading to Z p . 

The depths of two points a and b may be known if they lie 
on the known reference plane n d . Consequently, following 
Property 1, depth information at a and b propagate to every 
point p along the edge C. 

Depths can also propagate from edge to edge. This concept 
is used here to allow the depths to propagate over the entire 
image. According to the present embodiment, knowledge of the 
depth of three distinct points in the image can be used to 
recover the entire scene depth map. Hence, knowledge of these 
three distinct points can be used in place of knowledge of the 
reference plane n d . 

A, b and c are defined as three distinct points in the 
scannable area of the image. For purposes of this explanation, 
it can be assumed their depths Z a , Z B and Z c are known. By 
appropriate shadow scanning, the depth of any point p in the 
image can be obtained. 

This can be demonstrated using a constructive example. A 
shadow edge £l that goes through points a and B can be projected. 
A second shadow edge ^2 can be projected through points a and c. 
This is shown in Figure 10. The two points a and b on 6i are of 
known depth, and therefore the depth of every point along that 
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edge can be similarly computed. Analogously, for e 2 , the depth 

of every point along that edge can be computed. 

For every point p on the image, there can be an edge ^p that 
passes through p and intersects both e x and e 2 at distinct points 
Pi and P 2 which points are each different than a. Since Pi and P 2 
each lie on the two known edges at Gi and G 2 , the depths of %1 
and ^2 must also be known. Therefore, since two points on the 
shadow edge e p . are known, the depth of every point along ^ p may 
also be computed. In particular, the depth of the point p can 
be computed as shown in Figure 10. Moreover, if points between 
the edges intersect, then the depth information can propagate 
from edge to edge. 

As stated above, this means that knowledge of the depth at 
three distinct points in the image becomes sufficient to recover 
the entire scene depth map. To propagate depth information from 
{a,b,c} to p uses intersecting points pi and p2 of the shadow 
edges . 

The system uses edges %. An edge £ is an isolated edge if 
and only if it does not intersect with at least two other edges 
on the image. Depth information can not be propagated to any 
isolated edge from the rest of the edges. Isolated edges, 
therefore, can not be used in this system. 

This embodiment follows the summary flowchart of Figure 11, 
using the setup shown in more detail in Figure 12. In Figure 
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11, a number of summary operations are described, which are each 

described in further detail herein. The hardware of Figure 12 

shows a point source at S, and a straight edge producing 

element, e.g. a stick 1200 being shown in two different 

positions. Each different position projects a shadow onto the 

3-D scene 12. A camera at 1220 receives a view of the image 

plane, and uses that view to reconstruct details of the 3-D 

image . 

At 1110, a set of shadow images is obtained. The shadows 
are extracted, and their intersections are computed. 

Let n ± be the i th shadow plane generated by the stick (i - 

1,..., N) , with corresponding plane vector a>± = [w* w* w* ] T (in 

x y z 

dual space) . For all vectors co t to be well defined, it is 
required that none of the planes U z contain the camera center 0 C . 
Denote by e x the associated shadow edge observed on the image 
plane. The N vectors a> z constitute then the main unknowns in 
the reconstruction problem. Once those vectors are identified, 
all edges can be triangulated in space. Therefore, there is 
apparently a total of 3N unknown variables. However, given the 
scanning scenario, every shadow plane Yli must contain the light 
source point S. Therefore, X s = [x s Y s Z S ] T the light source 
coordinate vector in the camera reference frame (known), 
provides 
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V/ = l JV, (&»Xs} = l 

Equivalently, in dual-space, all shadow plane vectors a} L must 
lie on the plane S, dual-image on the light source point S. One 
may then explicitly use that constraint, and parameterize the 

vectors o) t using a two-coordinate vector u ± = [u* u' ] T such 

x y 

that: 



where G) Q , co s1 , and co s2 are three vectors defining the 
parameterization. For example, if X s * 0, one may then keep the 

last two coordinates of co± as parameterization: w L = [w 1 w Z ] T , 

y z 

picking a>si = [-Y s /X s 1 0] T , ^ s2 = [-Z s /X s 0 1] T and ^ 0 = [1/X S 0 
0] T . Any other choice of linear parameterization is acceptable 
(there will always exist one give that is S *0 C ). In order to 
define a valid coordinate change, the three non-zero vectors co Q , 
co sl , and a> s2 must only satisfy the three conditions (a) <a> 0 , X $> 
= 1, (b) <G) sl ,X s > = 0, (c) o) sl *6) s2 . In dual-space, {a> slf ~a> s2 ) 
(or W) may be interpreted as a basis vector of the plane S and 
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O) 0 as one particular point on that plane. Figure 13 shows the 

coordinate transformation. 

After that parameter reduction, the total number of unknown 

variables reduces to 2N: two coordinates u* and u* per shadow 

x y 

plane n ± . Given that reduced plane vector parameterization 

(called u -parameterization) , global reconstruction can be 
carried out. 

Intersecting points between the edges themselves depth 
information that propagates from edge to edge. These points 
provide geometrical constraints that may be extracted from the 
images. Therefore, a first operation may be carried out by 
studying the type of constraint provided by an elementary edge 
intersection. 

Assume that the two edges s n and e m intersect at the point 
p k on the image (n * m) , and let n n and II m be the two associated 
shadow planes with coordinate vectors co n and co m as shown in 
Figure 12. Let x k be the homogeneous coordinate vector of p k on 
the image plane, and Z k the depth of the corresponding point P k 
in the scene. The two edges e n and e m intersect in space at P k 
if and only if the planes n n and n m and the scene surface 
intersect at a unique point in space (P k ) . Equivalently, the 
depth Z k may be computed by triangulation using either plane Il n 
or n m . This condition translates into the two constraint 
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equations Z k = l/<a) n , x k > = l/<a* m , x k > in dual-space which can be 

rewritten as follows: 



{Xk.COn ~OJ m J = 0 

This unique equation captures then all the information that 
is contained into an elementary edge intersection. There is a 
very intuitive geometrical interpretation of that equation: Let 
A k be the line of intersection between the two planes n n and II m , 
in space, and let A k be the perspective projection of that line 
onto the image plane. Then, the vector A = 0) n -co m is one 
coordinate vector of the line A k . Therefore, equation 7.3 is 
merely <x k ,/t k > = 0, which is equivalent to enforcing the point p k 
to lie on A k (see Figure 12) . This equation has both advantages 
of (a) not explicitly involving Z k (which may be computed 
afterwards from the shadow plane vectors) and (b) being linear 
in the plane vectors unknowns o) n and a> m . The same constraint 
may also be written as a function of u n and u m , the two u- 
parameterization vectors of the shadow planes n n and n m : 





-0 



where y k = W T x k (a 2-vector) . Notice that this new equation 
remains linear and homogeneous in that reduced parameter space 
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Let N p be the total number of intersection points pk (k = 
1,...,N P ) existing in the edge-web (the entire set of edges). 
Assume that a generic p k lies at the intersection of the two 
edges e n(k) and e mW (n{k) and m(k) are the two different edge 
indices) . 

The total set of constraints associated to the N p 
intersections may then be collected in the form of N p linear 
equations : 



which may also be written in a matrix form: 

au = o Np 

where 0 Np is a vector of N p zeros, A is an N p x 2N matrix 
(function of the j/ k coordinate vectors only) and U is the 

- - T 

vector of reduced plane coordinate (of length 2N) : U = [w 

-T 112 2 — 

u ...] T = [u u u u ...] T . The vector U = {z/ik K - 
2 x y x y 

According to this equation, the solution for the shadow plane 
vectors lies in the null space of the matrix A. The rank of 
that matrix or equivalently the dimension of its null space are 
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identified at 1120, isolated edges and isolated group of edges 

are rejected. This forms an edge web which is fully connected, 

resulting in a set of an edges ^ and N P intersection points p K . 

An edge-web is fully connected if and only if it cannot be 
partitioned into two groups of edges which have less that two 
(zero or one) points in common. In particular a fully connected 
edge-web does not contain any isolated edge. Notice that under 
the condition only, depth information can freely propagate 
though the entire web. A normal scanning scenario is then 
defined as a scenario where the edge-web is fully connected and 
the total number of intersections is larger than 2N (this last 
condition will be relaxed later on) . 

In a normal scanning scenario, the rank of the matrix A is 
exactly 2N-3 (or alternatively, the null space of A is of 
dimension 3) . 

This is because in a normal scanning scenario, the 
reconstruction problem has at most three parameters. 
Consequently the dimension of the null space of A is at most 3, 
or equivalently, A is of rank at least 2N-3 in fact, usually 
exactly 2N-3. At 1130, a unitary seed vector U 0 is calculated. 
The scene factor is calculated using singular value 
decomposition or SVD. 

For every solution vector U = {u ± } to the equation, there 
exists three scalars a, p and X such that: 
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_0 - 

Ui = yu m + uo 



which are also obtained at 1130 with u 0 = [a p] T . Conversely, 
for any scalars a, (3 and g, the vector U = {wi} given by the 
equation is solution of a linear system. The vector U° is 
called a "seed" solution from which all solutions of the linear 
system may be identified. 

Any non-trivial solution vector U° = { w ° } may be used as 



seed as long as it is one solution of the equation the solution 
vector (called the unitary seed vector) that satisfies the two 



non trivial solution meaning that all u x cannot be identical. 
The unitary seed vector U° satisfies the linear equation BU° = 
6 n 2-o, where b is the following augmented (N p -2) x 2N matrix: 



extra normalizing conditions: (a) 




sl M.=0(zero mean), (b) 




= 1 (unit norm) may be used. Those conditions assure a 



A 



1 0 1 0 ... 1 0 
0 1 0 1 ... 0 1 
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The last two rows of B enforce the zero-mean constraint, 

bringing the rank of B to 2N-1 (=(2N-3) + 2). Therefore, the 

dimension of null space of B is one, leading to U° being the 
unitary eigenvector associated to the unique zero eigenvalue of 
B. Consequently, a standard singular value decomposition (SVD) 

of B allows to naturally retrieve U°. Such a decomposition 
leads to the following relation: 



B = USV T 



where U and V = [Vi V 2 ... V 2 n] are two unitary matrices of 
respective sizes (N p +2) x (N p +2) and 2N x 2N, and S is the (N p +2)x 

2N matrix of singular values. The unitary seed vector U° is 
then the column vector of V associated to the zero singular 
value. Without loss of generality, assume it is the last column 

vector: U = {u } = V 2N - Alternatively, one may retrieve the 

/ 

same V matrix by applying the same decomposition on the smaller 
2N x 2N symmetric matrix C = B T B. Such a matrix substitution is 
advantageous because the so-defined matrix, C, has a simple 
block structure: 
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C 



C 2l c 



2,2 



c 



2,N 



c c 



c 



N,N 



where each matrix element dj is of size 2x2. 

Two shadow edges can only intersect once (two shadow planes 
intersect along a line that can only interest the scene at a 
single point). All matrices C±j have the following expressions: 



kiij) 
T 



C i,i =I 2 - ^Yk(i,n)Y 



where I 2 is the 2x2 identity matrix. Observe that, every off- 
diagonal matrix element Cij(i^j) depends only on the intersection 
point pk(ij) between edges Gi and Gj . Every diagonal block C ifi 
however is function of all the intersection points of e x with the 
rest of the edge-web (the sum is over all points Pk{i,n), for n = 
(1,. . .,N). 

Once the C matrix is built, V is retrieved by singular 
value decomposition (1130; Figure 11). This technique allows 
then for a direct identification of the unitary seed solution of 
-o "° 

U = {u t } leading the set of all possible solutions of the 
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equation. Euclidean reconstruction is thus achieved up to the 
three parameters a, (3 and A. 

-0 

Once the seed solution W = { u } is found (by SVD) , one may 

i 

identify the final "Euclidean" solution U = {uj} at 1140 if the 
depth of (at least) three points in the scene are known. 
Without loss of generality, assume that these points are p k for k 
= 1,2,3 (with depths Z k ) . Those points provide then three linear 
equations in the unknown coefficient vector a=[oc, (3 y ] T 



for k = 1,2,3 resulting into a linear system of three equations 
and three unknowns. This system may then be solved, yielding 
the three coefficients a, (3 and A, and therefore the final 
solution vector U. Complete Euclidean shape reconstruction is 
thus achieved. If more points are used as initial depths, the 
system may be solved in the least squares sense (once again 
optimal in the inverse depth error sense) . Notice that the 
reference depth points do not have to be intersection points as 
the equation contends. Any three (or more) points in the edge- 
web may be used. 

While the above has described strict reconstruction, other 
clues about the scene may also be used. These clues may include 
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planarity of portions on the scene, angles between the different 

planes, or mutual distances between points in space. Any of 
these geometric clues can help either simplify the calculations, 
or make the calculations more strictly Euclidean. 
Another embodiment, shown in Figure 14, generalizes the above 
operation to use with multiple light sources. Using the 
notation above, two calibrated light sources may be used. In 
this case, the matrix A has a rank that is increased to 2N-2 . 
To shadow planes from the two light sources SI and S2 can 
intersect at least at two points in the scene. As described 
above, depth information at those two points can then propagate 
along the two edges. They also propagate to the rest of the 
edge web as described above. 

Two shadow planes Iln and Urn are generated from the two 
light sources as shown in Figure 14. Two corresponding shadow 
edges ^n and ^m intersect on the image plane at the two points P. 
Depth information at P k and qk propagates through the shadow 
edges for and am. This provides further information which can be 
used for further detection of information in the scene. 
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B-Dual- Space geometry 



The notation in the foregoing refers to "Dual Space" and "B- 
shape". The meaning of this notation is further described 
herein. All definitions are given in Euclidean space as well as 
in projective geometry. A mathematical formalism called B-dual- 
space geometry is derived from projective geometry. This 
formalism enables us to explore and compute geometrical 
properties of three-dimensional scenes with simple and compact 
notation. This will be illustrated in the following chapter when 
applying that formalism to the problem of camera calibration. 

Let (E) be the 3D Euclidean space. For a given position of 
a camera in space we define T = (0 c9 X c , Y c , Z c ) as the standard frame 
of reference (called "camera reference frame") where O c is the 
camera center of projection, and the three axes (0 C9 X c ) 9 (0 C , Y c ) 
and (0 c9 Z c ) are mutually orthogonal and right-handed ((0 c9 X c ) and 
(O c ,Y c ) are chosen parallel to the image plane). 

We may then refer to a point P in space by its 
corresponding Euclidean coordinate vector X=[X Y zf in that 
reference frame T . The Euclidean space may also be viewed as a 
three-dimensional projective space V 3 . In that representation, 
the point P is alternatively represented by the homogeneous 4- 
vector X^[X Y Z if . The sign ~ denotes a vector equality up 
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to a non-zero scalar. Therefore, any scaled version of 
[X Y Z if represents the same point in space. 

A plane II in space is defined as the set of points P of 
homogeneous coordinate vector X that satisfy: 

(*r,x) = 0 (2.1) 

where tt ~ \k x 7c y n z 7r t J is the homogeneous 4-vector parameterizing 
the plane II ({.} is the standard scalar product operator). 
Observe that if w is normalized such that ttI+ttI+ttI =L then 

^z'f is the normal vector of the plane II (in the camera 
reference frame T) and J 7r =-7r < its orthogonal (algebraic) 
distance to the camera center 0 C 

Image plane and perspective projection 

Let (/) be the 2D image plane. The image reference frame is 
defined as (c,x c ,y c ) where c is the intersection point between 
(0 C ,Z C ) (optical axis) and the image plane, and (c,x c ) and (c, y c ) are 
the two main image coordinate axes (parallel to (0 C , X c ) and 

(Oc>Y c )) ■ Tne point c is also called optical center or principal point. 

Let p be the projection on the image plane of a given point 
P of coordinates X=[X Y zf, and denote x = [x yf 
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its coordinate vector on the image plane. Then, the two vectors 

X and x are related through the perspective projection 
equation: 



(2.2) 





X 


_ l 


X 


X = 










y 


~ z 


Y 



This projection model is also referred to as a "pinhole" camera 
model . 

In analogy to Euclidean space, it is sometimes useful to 

view the image plane as a two-dimensional projective space V 2 . 
In that representation, a point p on the image plane has 

homogeneous coordinate vector x^[# y l] T . Similarly to , any 

scaled version of [x y l] T describes the same point on the image 
plane . 

One advantage of using projective geometry is that the 
projection operator defined in equation 2.2 becomes a linear 

operator from V 3 to V 2 . 



x ^ PX with 



(2.3; 



where X and x are the homogeneous coordinates of P and p 
respectively, 7 3x3 is the 3x3 identify matrix and 0 3xl is the 
3x1 zero-vector. Observe from equation 2.3 that x is equal (up 
to a scale) to the Euclidean coordinate vector X =[X Y Z] of Pi 

x^X (2.4) 
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Therefore x is also referred to as the optical ray direction 
associated to P . 

A line A on the image plane is defined as the set of points 
p of homogeneous coordinate vectors x that satisfy: 

(A,x) = 0 (2.5) 

- r i T 

where X =^X X \ X z is the homogeneous 3-vector defining the line 
A. Observe that if A is normalized such that A*+A?=l, then 
\] is the normal vector of the line A (in the image 

reference frame) and d x =-X z its orthogonal (algebraic) distance 
to the principal point c . 

Claim 1: Let p x and p 2 be two distinct points on the image plane 
with respective homogeneous coordinate vectors x x and x 2 . Then, 
it is straightforward to show that the line A connecting p x and 
p 2 has homogeneous coordinate vector A ~ Xj x x 2 , where x is the 
standard vector product operator in R 3 . 

Claim 2: Let Aj and A 2 be two distinct lines on the image plane 
with respective homogeneous coordinate vectors \ and A 2 . Then, 
the point of intersection p between the two lines X x and A 2 has 
homogeneous coordinate vector x^A { x A 2 . If the two lines are 
parallel then the last coordinate of x is zero. In that case p 
is a point at infinity. 
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There exists useful relations between lines on the image 
plane and planes in space, as illustrated by the two following 
examples . 



Example 1: Consider a line A on the image plane of coordinate 

iT 7 



vector A = 



K \ -\z] • Then, the set of points P in space that 



project onto A is precisely the plane II A spanned by A and the 



camera 0 C . Let tt x be the coordinate vector of II . Let 



us 



compute tt a as a function of A . According to equations 2.3 and 

2.5, a point P of homogeneous coordinate vector X will lie on 
II A if and only if: 



(a,px) = 



0 



(2.6) 



this relation enforces the projection of P to lie on A. This 
may be alternatively written: 

(p T A,x) = 0. (2.7) 
Therefore the plane coordinates tv a has the following expression 



7I\ 



P r A = 



A 
0 



\ 
A. 



(2.8) 



Example 2: Consider two planes in space II, and II 2 of respective 
coordinate vectors tt, ^ ^ ^ n^J and ^[tt^ ^ ^ ^ J . 
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Assume the two planes intersect along line A in space, and call 
A the resulting image line after projection of A onto the image 
plane. Let us compute the coordinate vector A of A as a 
function of W l and W 2 . Consider a point P on A and denote p 
its projection on the image plane. Since P lies on II 1 and U 2 , 

its homogeneous coordinate vector X~[x Y Z l] T must satisfy the 
following system: 



■7r x X + 7v y Y + 7r Zi Z + 7r tx = 0 (I,) 
This system yields: 



(2.10) 



Since the homogeneous coordinate vector of p is x~X = [X Y Z] T , 
equation 2.10 reduces to a standard image line equation: 

(A, x} = 0 (2.11) 



where 



A~ 



7T, 7V — 7V f 7T 



(2.12: 
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that is the coordinate vector of A, projector of A = n i nll 2 . 

Rigid body motion transformation 

Consider a set on N points P l in space ( % = l,... 9 N ) , and let 

X%-\X l Y i be their respective coordinate vectors in the 

camera reference frame T . Suppose the camera moves to a new 
location in space, and let Xl =[X} %' Z'J be the coordinate 
vectors of the same points P % in the new camera reference frame 

T f . Then X l and X[ are related to each other through a rigid 
body motion transformation: 

Vi = (l,...JV), Xt=RX t +T (2.13) 

Where ReS0(3), which is a special orthogonal 3x3 matrix, and T 
are respectively a 3x3 rotation matrix and a 3-vector that 
uniquely define the rigid motion between the two camera 
positions. The matrix R is defined by a rotation vector 

0 = Q y ftj r such that: 

R = e nA (2.14) 
where ClA is the following skew- symmetric matrix: 
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o -n z n y 
a o -n, 



(2.15) 



Equation 2.14 may also be written in a compact form using the 
Rodrigues 1 formula : 



j R = 7 3>< 3COs(6') + [Qa] 



sin(0) 

0 



l-cos(fl) 

e 2 



(2.16) 



where # = ||Q||, and is the following semi-positive definite 

matrix : 



n y n x n 2 y n y ti z 



(2.17) 



The fundamental rigid body motion equation 2.13 may also be 
written in projective space V 3 . In V 3 , the point P t has 

homogeneous coordinate vectors X« [X t Y l Z % l] T and 

X»' ^X' t Y/ Z[ if in the first (f) and second (f) reference frames 
respectively. Then, equation 2.13 may be written: 



X* ~£>X* with D = 



R T 
0,x 3 1 



(2.18) 



where 0 lx3 is a 1x3 zero row vector. Observe that the inverse 
relation may also be written as follows: 
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X*~LHX with Dr l 



R T -R T T 



0 



1x3 



1 



(2.19) 



Let pi be the projection of P % onto the second camera image 
plane, and let x[ be the homogeneous coordinate vector. Then, 
following the equation 2.3, we have: 



(2.20) 



which may be also written: 



x' ~ FX 



(2.21) 



where: which may be also written: 



P' = PZ) = 



R T 



(2 .22) 



The matrix P' is the projection matrix associated to the second 
camera location. 

Consider now a plane II of homogeneous coordinate vectors 7f 

and 7f' in both camera reference frames J 7 and !F . How do n and 

7f' relate to each other? Consider a generic point P on II with 

homogeneous coordinate vectors X and X in both reference 
frames. According to equation 2.1, we have: 
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(2.23) 



which successively implies: 



7f,Z)- 1 X/) = 0 



d- t ttX ) = o 



(2.24) 



(2.25) 



Therefore : 



7r f ~ Z)~ r 7r = 



i2 0 



3x1 
1 



7T 



(2.26) 



Similarly, the plane coordinate vector before motion 7r may be 
retrieved from 7f' through the inverse expression: 



7f - Z^tt ' = 



7* 



ft 



3x1 
1 



7T 



(2 .27) 



In order to put in practice these concepts, let us go 
through the following example: 



Example 3: In the second reference frame T f (after camera 

motion) , consider a line A' on the image plane, and the plane II 
that this line spans with the camera center (similarly to example 

1 . Let A' and 7f' be the homogeneous coordinate vectors of A' 
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and IT in T' . Let us compute 5r , the coordinate vector of n in 

the initial camera reference frame T (before motion) as a 

function of A', R and T . 

According to equation 2.8, W and A' are related through the 
following expression: 



7T ~ 



A' 
0 



[2.28) 



Then, 7r may be calculated from using equation 2.27: 



7T ~ Z) t tt' = 



R T o 3xl " 




A' 




R T X' 






0 




_(T,A')_ 



(2.29) 



where P' is the projection matrix associated to the second camera 
location (eq. 2.22). Observe the similarity between equations 
2.8 and 2.29. 



Definition of B-dual-space 

As presented in the previous section, a plane II in space is 



% \ k 2 ?rj in the 



represented by an homogeneous 4 -vector 7r: 

camera reference frame T={§ c , X C ,Y C , Z^j (see equation 2.1). 
Alternatively, if II does not contain the camera center 0 C 
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(origin of T ) then it may be represented by a 3 -vector 



■x U y u z 



, such that 



(w,jst)=: 



(2.30) 



for any point P e II of coordinate vector X = X Y zj in T . 
Notice that u = njd„ where n„ is the unitary normal vector of the 
plane and d n *0 its distance to the origin. Let (Q) = R 3 . Since 
every point O e (fl) corresponds to a unique plane n in Euclidean 
space (E) , we refer to (0) as the ^plane space' or ^-dual- 
space' . For brevity in notation, we will often refer to this 
space as the dual-space . There exists a simple relationship 

between plane coordinates in projective geometry and dual -space 
geometry: 



uj = — 







1 

















if 7T t *0 



(2.31) 



In that sense, dual -space geometry is not a new concept in 
computational geometry. Originally, the dual of a given vector 
space (E) is defined as the set of linear forms on (E) (linear 
functions of (E) into the reals R ) . In the case where (E) is 
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the three dimensional Euclidean space, each linear form may be 
interpreted as a plane II in space that is typically 

parameterized by a homogeneous 4 -vector 7f ~[tt x ir y tt z 7r t *J . A point 
P of homogeneous coordinates X = \X Y Z 1 lies on a generic 

plane II of coordinates tt if and only if ^7f ? X^=0 (see [13]). 

Our contribution is mainly the new cU-parameterization. We will 
show that this representation exhibits useful properties allowing 
us to naturally relate objects in Euclidean space (planes, lines 
and points) to their perspective projections on the image plane 
(lines and points) . One clear limitation of that representation 
is that plane crossing the camera origin cannot be parameterized 
using that formalism (for such planes 7r t = 0), However, this will 
be shown not to be a critical issue in all geometrical problems 
addressed in this thesis (as most planes of interest do not 
contain the camera center) . 

Properties of B-dual space 

This section presents the fundamental properties attached to 
dual - space geometry . 

The following proposition constitutes the major property 
associated to our choice of parameterization: 

Proposition 1: Consider two planes n a and II 6 in space, with 

respective coordinate vectors to a and 4(^*4) in dual-space, and 
let A = n a nll 6 be the line of intersection between them. Let A 
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be the perspective projection of A on the image plane, and A 
its homogeneous coordinate vector. Then A is parallel to u) a -u^. 
In other words, O a -u b is a valid coordinate vector of the line 
A . 

Proof: Let P 6 A and let p be the projection of P on the 

image plane. Call X = [x Y zj and x~±X the respective 
coordinates of P and p. We successively have: 

P e u b 

= i 

=> ^d; o -q,x^ = 0 (since Z^O) 

Notice that this relation is significantly simpler than that 
derived using standard projective geometry (equation 2.12). 

In addition, observe that the coordinate vector u) of any 
plane IT containing the line A lies on the line connecting 

° a and <4 in dual-space (Q) . We denote that line by A and call 
it the dual image of A. The following definition generalizes that 
concept of dual image to other geometrical objects: 



PeA <S> 
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Definition: Let A be a sub-manifold of (E) (e.g., a point, 

line, plane, surface or curve) * The dual image of A of A is 
defined as the set of coordinates vectors O in dual-space (Q) 

representing the tangent planes to A . Following that standard 
definition (see [13, 14]), the dual images of points, lines and 
planes in (E) may be shown to be respectively planes, lines and 
points in dual-space (f2) . Further properties regarding non- 
linear sub-manifolds may be observed, such as for quadric 
surfaces in [15] or for general apparent contours in space in 
[16] . 

The following five propositions cover the principal 
properties attached to the dual-space formalism. 

Proposition 2 - Parallel Planes - Horizon Line: Let II a and 

U b be two parallel planes of coordinates O a and . Then O a is 
parallel to u\ . 

Proof: The planes have the same surface normals n a =n b . 

Therefore, the proposition follows from definition of Uj . 

The horizon line H represents the "intersection" of two 

A 

planes at infinity. The dual image of H is the line H 
connecting u) a and u) h and crossing the origin of the (Vt) space. 
The direction of that line is not only the normal vector of the 
two planes n a — n b , but also the representative vector X H of the 
projection X H of H (horizon line) on the image plane (according 
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to proposition 1) . Although H is not a well-defined line in 
Euclidean space (being a line at infinity) , under perspective 
projection, it may give rise to a perfectly well defined line A j 
on the image plane (for example a picture of the ocean) . Once 
that line is extracted, the orientation of the plane is known: 

u a - w h ~ \ H (2.32) 



Proposition 3 - Orthogonal Planes: If two planes II a and II 6 are 

two orthogonal, then so are their coordinate vectors O a and £J b 
in dual-space. Consequently, once one of the plane O a is known, 

then u) b is constrained to lie in the sub-space orthogonal to O a , 
a plane in dual -space. 

Proposition 4 - Intersecting lines: Consider two lines A a 

and A b intersecting at a point P , and call II the plane that 

contains them. In dual -space, the two dual lines A a and A 6 

necessarily intersect at (D the coordinate vector of n (since O 
is the plane that contains both lines) . Similarly, the dual 

image P of P is the plane in dual -space that contains both dual 

lines and A b . Notice that P does not cross the origin of 
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Proposition 5 - Parallel lines - Vanishing Point: Consider 

two parallel lines A a and A b belonging to the plane II of 
coordinates CD . Then CD is at the intersection of the two dual 

lines A a and A b . In dual-space, the plane containing both dual 
lines A a and A 6 is the dual image of V of the vanishing point V , 
i.e., the intersection point of A a and A b in Euclidean space. 
If H is the horizon line associated with II, then V € H , which 
translates in dual-space into H EV . Since H contains the 
origin, so does V . Notice that once the perspective projection 

v of V is observable on the image plane, the plane V is 
entirely known (since its orientation is the coordinate vector of 
v) . 

Proposition 6 - Orthogonal lines: Let A x and A 2 be two 

orthogonal lines contained in the plane II of coordinates CD) and 

let CD = A x D A 2 . Consider the set of planes orthogonal to II . 
In the dual -space, that set is represented by a plane containing 
the origin, and orthogonal to CD (see proposition 3) . Call that 

plane V (it can be shown to be the dual image of a vanishing 
point) . In that set, consider the two specific planes IIj and II 2 

that contain the lines A { and A 2 . In the dual-space, the 
representative vectors a?! and CD 2 of those two planes are defined 
as the respective intersections between V and the two lines A { 

A 

and A 2 . Then, since the two lines A] and A 2 are orthogonal, the 
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two vectors uj x and u) 2 are also orthogonal. This implies that 
the images of the two vanishing points VJ and V 2 associated to 
the lines Aj and A 2 are orthogonal in the dual -space. Two 
vanishing points are enough to recover the horizon line H 
associated with a given plane II in space. Therefore, observing 
two sets of parallel lines belonging to the same plane under 
perspective projection allows us to recover the horizon line, and 
therefore the orientation of the plane in space (from proposition 
2) . The horizon line H corresponding to the ground floor II is 
recovered from the two vanishing points Vj and V 2 . 

2.2.3 Geometrical problems solved in B-dual-space 

This section presents several useful geometrical problems solved 
using dual -space geometry. 

Example 4: Let II be a plane in space of coordinate vector Q . 

Let P be a point on II with coordinate vector X = [X Y Z] T . 
Let p be the projection of P onto the image plane, and denote 
x — [x y \] T its homogeneous coordinate vector. The 
triangulation problem consists of finding the point P from its 
projection p and the plane II, or calculating X from x and cD . 
Since P lies on the optical ray (0 C9 p) , its coordinate vector 
satisfies X ~ x or equivalently , X = Zx , with x = [x y if . in 
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addition, since P lies on the plane II , we have (v 7 X) = 1 . This 
implies : 

1 X = — ^— (2.33) 



fax) (u) 9 x) 
This is the fundamental triangulation equation between a ray and 
a plane in space. 

Example 5: Consider two camera frames T and and let {R>T} 

be the rigid motion parameters between T and T 1 . Let IT be a 
plane in space of coordinate vectors O and u;' in T and T 1 
respectively. How do O and O f relate to each other? 

Consider a generic point P on II of coordinate vectors X 

and ~X in T and T 1 respectively. Then, X = RX + T . Since 
Pell, we may write : 

{<3\X f ) = 1 (2.34) 
(O f ,RX + T) = 1 (2.35) 
{R T Cu',X} = 1 - {Q',T) (2.36) 



" , ,X ) = 1 if &,T) * 1 



(2.37) 



Therefore : 



O - ^ x ■ (2.38) 

1 - (uf,T) 
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This expression is the equivalent of equation 2.27 in dual-space 
geometry. Notice that the condition (o\T) ^1 is equivalent to 
enforcing the plane II not to contain the origin of the first 
camera reference frame T . That is a necessary condition in 
order to have a well defined plane vector Q . The inverse 
expression may also be derived in a similar way: 

Uf = ^r-r. (2.39) 

In that case, the condition (lj,-R t T) ^ 1 constraints the plane II 
not to contain the origin of the second reference frame T f (in 
order to have a well defined vector Q* ) . 

In some cases, only one of the two plane vectors Q or (D f is 
well-defined. The following example is one illustration of such 
a phenomenon. 

Example 6: Consider the geometrical scenario of example 5 where 

the plane II is now spanned by a line A' on the image plane of 
the second camera reference frame T f (after motion) . In that 
case, the coordinate vector O' is not well defined (since by 
construction, the plane II contains the origin of ) . However, 
the plane vector CO may very well be defined since II does not 
necessarily contain the origin of the first reference frame T . 
Indeed, according to equation 2.29, the homogeneous coordinate 
vector 7r of II in T is given by: 
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7T 



R T \' 
{T,\>) 



(2.40) 



where A' is homogeneous coordinate vector of the image line A' in 

T' . Then, according to expression 2.31, the corresponding dual- 
space vector O is given by: 

R T X' 



LO = 



<r,A') 



(2.41) 



which is perfectly well-defined as long as II does not contain 
O c , or equivalently if {T,\*} ^ 0. This condition is equivalent 
to enforcing the line not to contain the epipole on the image 
plane attached to camera frame. The point is the projection of 
onto the image plane attached to the second camera reference 
frame . 



Example 7: triangulation of an optical ray (0 C9 p) with the plane 

II spanned by the line A ; in the other camera reference frame 

T* . Let X — [X Y Z] T be the coordinates of P in space and 

x = [x y l] T the coordinates of its known projection p on the 
image plane. Equation 2.41 provides then an expression for the 
coordinate vector Q of II in frame T : 

a -B> 

where A' is the homogeneous coordinate vector of the image line 
A ; in T* . The triangulation expression given by equation 2.40 
returns then the final coordinate vector of P : 
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Observe that the plane II is not allowed to cross the origin of 
the initial reference frame T , otherwise, triangulation is 
impossible. Therefore the plane vector 0 is perfectly well 

defined (i.e. , (T,A'> ^ 0) . 
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3.1.1 Camera Calibration in B-dual-space geometry 
The position of a point p in a real image is originally 
expressed in pixel units. One can only say that a point p is at 

the intersection of column p x — 150 and row p y —50 on a given 

digitized image. So far, we have been denoting x= [x y if the 
homogeneous coordinate vector of a generic point p on the image 
plane. This vector (also called normalized coordinate vector) is 

directly related to the 3D coordinates X — [X Y Z] T of the 

corresponding point P is space through the perspective 
projection operator (eq. 2.2). Since in practice we only have 

rp 

access to pixel coordinates p = [p x p y 1] / we need to establish a 
correspondence between p and x (from pixel coordinates to 
optical ray in space) . 

Since the origin of the* image reference frame is at the 

optical center c (or principal point) , it is necessary to know 

the location of that point in the image: c =[c x c y ] (in pixels). 

Let f 0 be the focal distance (in meters) of the camera optics 
(distance of the lens focal point to the imaging sensor) , and 

denote by d x and d y the x and y dimensions of the pixels in the 

imaging sensor (in meters). Let f x =f 0 /d x and f y —f 0 /d y (in 

pixels) . Notice that for most imaging sensors currently 
manufactured, pixels may be assumed perfectly square, implying 
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d x = d y or equivalently f x —f y - In the general case f x and f y may 
be different. 

Then, the pixel coordinates p — [p x p y l] T of a point on the 
image may be computed from its normalized homogeneous coordinates 
x—[x y \] T through the following expression: 

Px — fx % ~^ C x , ^ . 

(3.1) 

Py = JyV + C y 

That model assumes that the two axes of the imaging sensor are 
orthogonal. In the case where they are not orthogonal, the pixel 
mapping function may be generalized to: 

Px = fxZ~af y y + Cx 

(3.2) 

Py - fyV + Cy 

where a is a scalar coefficient that controls the amount of skew 
between the two main sensor axes (if a = 0 , there is no skew) . 
For now, let us consider the simple model without skew (equation 

3.1) . If pis the image projection of the point P in space (of 

coordinates X — [X Y Z] T ) , the global projection map may be 
written in pixel units: 

K = f x (XlZ) + c x 
Py = f,(Y/Z) + c, 

This equation returns the coordinates of a point projected 
onto the image (in pixels) in the case of an ideal pinhole 
camera. Real cameras do not have pinholes, but lenses. 
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Unfortunately a lens will introduce some amount of distortion 
(also called aberration) in the image. That makes the projected 
point to appear at a slightly different position on the image. 
The following expression is a simple first-order model that 
captures the distortions introduced by the lens: 



pinhole projection 



P 



x/z 


Y/Z 


K 




V 




Px 




Py 







X 




v. 



x(\ + k c \af) 



fxK+ C x 

fy b y +c y 



radial distortion 



pixel coordinates 



(3.4) 



where k c is called the radial distortion factor. This model is 
also called first -order symmetric radial distortion model 
("symmetric" because the amount of distortion is directly related 

to the distance of the point to the optical center c ) . Observe 

that the systems (3.4) and (3.3) are equivalent when k c —0 (no 
distortion) . 

Therefore, if the position of the point Pis known in 

camera reference frame , one may calculate its projection onto the 

image plane given the intrinsic camera parameters f x9 f y , c x , c y and 

k c . That is known as the direct projection operation and may be denoted 

p = U(X). However, most 3D vision applications require to solve 
the "inverse problem" that is mapping pixel coordinates p to 3D 

world coordinates [X Y Zf . In particular, one necessary step 
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is to compute normalized image coordinates x = [x y 1] (3D ray 

direction) from pixel coordinates p (refer to equation 3.4). 

The only non-trivial aspect of that inverse map computation is in 

computing the vector a from b . This is the distortion 
compensation step. It may be shown that for relatively small 
distortions, this inverse map may be very well approximated by 
the following equation: 







2 




b 





Experimentally, this expression is sufficiently accurate. 
Camera calibration 

The camera calibration procedure identifies the intrinsic camera 
parameters f x ,f y ,c x ,c y and k c (and possibly a), A standard method 
is to acquire an image a known 3D object (a checker board 
pattern, a box with known geometry...) and look for the set of 
parameters that best match the computed projection of the 
structure with the observed projection on the image. The 
reference object is also called calibration rig . Since the camera 

parameters are inferred from image measurements, this approach is 
also called visual calibration . This technique was originally 

presented by Tsai and Brown. An algorithm for estimation was 
proposed by Abdel-Aziz and Karara (for an overview on camera 
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calibration, the reader may also refer to the book Klette, 
Schluns and Koschan ) . 

Figures 4A and 4B show two examples of calibration images 
when using a planar rig (checker board pattern) . Figure 5 shows 
a 3D rig (two orthogonal checker board patterns) . 

Note that although the geometry of the calibration rig is 
known (i.e., the mutual position of the grid corners in space), 
its absolute location with respect to the camera is unknown. In 
other words, the pose of the calibration pattern in unknown. 
Therefore, before applying the set of equations (3.4) to compute 
the image projection of every corner in the structure, one needs 
to find their 3D coordinates in the camera reference frame. We 
first choose a reference frame attached to the rig {called the 

object frame) in which we express the known coordinates X 0 of 
all the corners P % , (i = . This set of vectors is known since 

the intrinsic rig structure is known. Then, the 

coordinate vector Xl of P % in the camera frame is related X 0 
through a rigid motion transformation: 

Vt = l,...,iV, Xl=R c Xl+T c (3.6) 

where R c and T c define the pose of the calibration rig with 
respect to the camera (similarly to equation 2.13). See figure 
3.2. 

Notice that by adding the calibration object in the scene, 
more unknowns have been added to the problem: R c and T c . Those 
parameters are called extrinsic camera parameters since they are 
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dependent upon the pose of the calibration pattern with respect 
to the camera (unlike the intrinsic parameters that remain 
constant as the rig is moved in front of the camera) . 

Let Clc be the rotation vector associated to the rotation 
matrix R c (see equation 2.13). Then, the complete set of 
unknowns to solve for is: 

• Focal length: f x J y (2 DOF) , 

• Principal point coordinates: c x , c y (2 DOF), 

• Radial distortion factor: k c (1 DOF) , 

• Calibration rig pose: Q c , T c (6 DOF). 

Therefore, the global calibration problem consists of solving for 
a total of 11 scalar parameters (adding the skew coefficient a 
would bring the number of unknowns to 12) . 

Let p % {% — l,...,N) be the observed image projections of the 
rig points P % and let p % = [pi p % y f be their respective pixel 

coordinates. Experimentally, the points p t are detected using 
the standard Harris corner finders. 

The estimation process finds the set of calibration unknowns 
(extrinsic and intrinsic) that minimizes the reprojection error. 
Therefore, the solution to that problem may be written as 
follows : 

r — N — 

{/*, f y , Cy, KMc, T c j - Argmin J]||p t " Tl(R c Xl +T c f (3.7) 
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where R c — e QcA ,n(.) is the image projection operator defined in 

equation 3.4 (function of the intrinsic parameters f x9 f y9 c x , c p and 

k c ) and ||.|| is the standard distance norm is pixel units. This 
non- linear optimization problem may be solved using standard 
gradient descent techniques. However, it is required to have a 
good initial guess before starting the iterative refinement. a 
method to derive closed form expressions for calibration 
parameters that may be used for initialization is presented. 

Apart from numerical implementation details, it is also 
important to study the observability of the model. In other 
words, under which conditions (type of the calibration rig and 
its position in space) can the full camera model (eq. 3.4) be 
estimated from a single image projection. For example, it is 
worth noticing that if the calibration rig is planar (as shown on 
figure 3.1-a) the optical center c cannot be estimated (the two 
coordinates c and c ) . Therefore, in such cases, it is 

X y 

necessary to reduce the camera model to fewer intrinsic 
parameters and fix the optical center in the center of the image. 
Further discussions on the camera model observability are 
presented. 

Closed- form solution in B-dual- space 
geometry 

This section demonstrates how one may easily retrieve closed-form 
expressions for intrinsic and extrinsic camera parameters using 
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the dual-space formalism as a fundamental mathematical tool. The 
method is based on using vanishing points and vanishing lines. 
The concept of using vanishing points for camera calibration is 
not new (most of the related work on this topic may probably be 
found in the art. Therefore, this does not to state new concepts 
or theories on calibration, but rather illustrate the convenience 
of the dual-space formalism by applying it to the problem of 
calibration. We show here that this formalism enables us to keep 
the algebra simple and compact while exploiting all the 
geometrical constraints present in the scene (in the calibration 
rig) . That will also lead us to derive properties regarding the 
observability of several camera models under different 
geometrical configuration of the setup, and types of calibration 
rig used (2D or 3D) . Most related work on that topic only deal 
with simple camera model (unique focal length) and extract the 
extrinsic parameters through complex 3D parameterization (using 
Euler angles) . Other standard methods for deriving explicit 
solutions for camera calibration were presented by Abdel-Aziz and 
Karara and Tsai. These methods are based on estimating, in a 
semi- linear way, a set of parameters that is larger than the real 
original set of unknowns and do not explicitly make use of all 
geometrical properties of the calibration rig. 

The method that we disclose here involves very compact 
algebra, uses intuitive and minimal parameterizations, and 
naturally allows to exploit all geometrical properties present in 
the observed three-dimensional scene (calibration rig) . In 
addition, our approach may be directly applied to natural images 
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that do not contain a special calibration grid (such as pictures 
of buildings, walls, furniture...). 

Once it is computed, the closed-form solution is then fed to 
the non-linear iterative optimizer as an initial guess for the 
calibration parameters. This final optimization algorithm is 
inspired from the method originally presented by Tsai including 
lens distortion (see equation 3.7). The purpose of that analysis 
is to provide a good initial guess to the non-linear optimizer, 
to better insure convergence, and check for the consistency of 
the results . 

We will first consider the case of a calibration when using 
a planar rig (a 2D grid) , and then generalize the results to 3D 
rigs (such as a cube) . In those two cases, different camera 
models will be used. 

3. 2.1. When using a planar calibration rig 

Consider the calibration image shown in figure 4A. Assuming no 
lens distortion (A; c =0) and no image noise, the grid may be 
summarized by its four extreme corners on the image (intuitively, 
one may localize all the inside grid corners from those four 
points by simple perspective warping) . In practice, all points 
will be used in order to be less sensitive to image noise, 
however the principle remains the same. Then, the basic observed 
pattern is a perspective view of a rectangle of known dimensions 
LxW . Without loss of generality, we can also assume that this 
rectangle is a square. The reason for that is that through a 
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similar perspective image warping, it is always possible to 
convert a perspective view of a rectangle into a perspective view 
of a square, given that the dimensions of the original rectangle 

are known (actually, only the ratio W/L is necessary) . 

Figure 15 shows a perspective image of a square ABCD . The 

four points x v x 29 x 3 and x 4 are the coordinate vectors of the 

detected corners of the square on the image plane 

after normalization . This means that the x vectors are computed from 

the pixel coordinates of the points after subtraction of the 

optical center coordinates (c x ,c ) (in pixel) and scaling by the 

inverse of the focal length (in pixel as well) . To model the 
aspect ration in x and y , one can assume two distinct focal 

lengths f x and / in both image directions (to account for non- 
square CCD pixels) . 

In the case of calibration from planar rigs, it is known 

that the optical center position (c x ,c ) cannot be estimated. 

Therefore, we will keep it fixed at the center of the image, and 
take it out of the set of unknowns. The resulting intrinsic 

parameters to be estimated are therefore f x and f . Let 

p. ~ [p x p y \] T (i = l,...,4) be the pixel locations of the corners 

after subtraction of the optical center (in homogeneous 
coordinates) . Then one can extract the x vectors through a 
linear operation involving the focal lengths f x and f : for 

i = l,...,4, 
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o i/f y o 

0 0 1 



(3.8) 



where K is the intrinsic camera matrix containing the intrinsic 
parameters (^and/). Let us now extract the set of independent 
constraints attached to the observation in order to estimate the 
focal lengths (hence the camera matrix K) . 

Figure 15 shows the set of corner points x l on the image 
plane. Following proposition 5 of section 2.2.2, lines 
\(i = l,...,4) are used to infer the two vanishing points V x and V 2 
in order to recover the projection A^ of the horizon line H 
associated to the plane II d . The derivation is as follows: 



Aj — X^ x x 2 
A2 — $2 ^ ^4 



A^ — ^ 
A 4 — X 



V x ~ Aj xA 2 



X H ^V } xV 2 



(3.9) 



where x is the standard vector product in R 3 . Notice that in 
order to keep the notation clear, we abusively used V x and V 2 to 
refer to the homogeneous coordinates of the vanishing points on 
the image plane (quantities similar to the x % * s using homogeneous 
coordinates) . It is important to keep in mind that all 
equalities are defined "up to scale." For example, any vector 
proportional to x x x x x would be a good representative for the same 
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line Aj . The same observation holds for the coordinate vectors 
of the vanishing points and that of the horizon line. 

Yet, the normalized coordinates x % of the corners are not 
directly available, only the pixel coordinates p % . However, all 
x t 's can be retrieved from the p t x s through the linear equation 

3.8. We will use of the following statement whose proof may be 
found in [30] : 

Claim 1: Let K be any 3x3 matrix, and u and v any two 3- 
vectors. Then the following relation holds: 



where K* is the adjoint of K (or the matrix of cofactors of 
K ) . Note that if K is invertible (which is the case here), 
then K* = det (K) (K T )~ l t and consequently K**ocK. 

Using that claim, the camera matrix K (or K* ) may be 
factored out of the successive vector products of equations 3.9, 
yielding: 



where A/\ A|, Aj\ A/, V x p , V* and A| are line and point coordinate 
vectors on the image plane in pixel (directly computed from the 
pixel coordinates p v p 2 ,p 3 and p 4 ) : 



(Ku)x(Kv) = K*(uxv) 



A^ 



K~KK P 



A 2 ~ K*\* 
A 3 ~ K*\? 



V 2 ~KV 2 P 
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V7~A 3 p xA 4 p 



The step of inferring the vanishing points V l and V 2 from the 
pairs of lines {A 15 A 2 } and {A 3 ,A 4 } made use of the fact that ABCD 

is a parallelogram (proposition 5) . Using proposition 6 (in 
section 2.2.2), one naturally enforce orthogonality of the 

pattern by stating that the two vanishing points Vj and V 2 are 

mutually orthogonal (see figure 2.11) : 



V X LV 2 & (KV l p )±(KV 2 p ) 

& (Vff(K T K) (T/) = 0 



(3.10) 



That provides one scalar constraint in the focal lengths f x and 

Jy 



a, a, b, b~ 



r / 



2 1 "1"2 



(3.11) 



where o p a 2> £> 1? & 2 , q and c 3 are the known pixel coordinates of the 

vanishing points V x p and V 2 : ~ [aj cJ T and — [a 2 b 2 c 2 ] T . 
Notice that equation 3.11 constraints the two square focals 
(fx>fy) to l^- e on a fi x ^d hyperbola. Finally, the parallelogram 

ABCD is not only a rectangle, but also a square. This means 
that its diagonals (AC) and (BD) are also orthogonal (see figure 
15) . This constraint is exploited by enforcing the two vanishing 
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points attached to the diagonal V 3 and V 4 to be mutually 
orthogonal (proposition 6) . Those points are extracted from 
intersecting the two diagonal lines A 5 and A 6 with the horizon 

line X H (see figure 15) . Following the same process of 

factoring the K matrix (or K* ) out of every successive vector 
product, one obtains: 

X 5 ^K*X P V 3 ^KV P 
A 6 ~iTA£ V 4 ~KV P 

where V p and Vf are the two pixel coordinates of the vanishing 
points V 3 and V 4 (pre -computed from the pixel coordinates of the 
corner points) : 

Then, the orthogonality of V 3 and V 4 yields (V P ) T (K T K) (V 4 ) = 0 , 
or: 

^ + ^ + c,c 4 =0 (3 . 12) 

Jx Jy 

where a 3 , o 4 , fc 3 , 6 4 , c 3 and c 4 are the known pixel coordinates of V p 
and V/ : V p ^ [a 3 b 3 c 3 f and V p ^ [o 4 b 4 c 4 f . This constitutes a 
second constraint on f x and f y (a second hyperbola in the (f?,fy) 
plane), which can be written together with equation 3.11 in a 
form of a linear equation in u = [«, u 2 ] T = [\f 1/ ' : 
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¥2 


and b = — 


C 1 C 2 ' 


a 3 a 4 






c 3 c 4 



An = b with A — 

If A is invertible, then both focals f x and f y may be recovered 
explicitly: 



u — 



1//, 



fy=<JV^2 



or : 















^3 ^4 ^1 ^2 


Cb^ Ct^ ^3 ^4 



/„ = 

under the condition ^ > 0 and u 2 > 0 . 

If ^4 is not invertible (or a x a 2 b 3 b 4 - a 3 a 4 b x b 2 = 0 ) , then both 
focals (/,/) cannot be recovered. However, if A is of rank one 

x y 

(i.e. it is not the zero matrix), then a single focal length 
model f =f =/ may be used. The following claim gives a 

c x y 

necessary and sufficient condition for A to be rank one: 

Claim 2: The matrix A is rank one if and only if the ■ 

projection of the horizon line is parallel to either the x or 
y axis on the image plane {its first or second coordinate is 
zero, not both) , or crosses the origin on the image plane (its 
last coordinate is zero) . Since the matrix K is diagonal, this 
condition also applies to the horizon line in pixel coordinates 
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Corollary: Since is proportional to the surface normal 
vector n h (from proposition 2 in section 2.2.2) , this degeneracy 
condition only depends upon the 3D orientation of the plane 11^ 

with respect to the camera, and not the way the calibration grid 
is positioned onto it (this is intrinsic to the geometry of the 
setup) . 

In such a rank-one degenerate case, the reduced focal model 
is acceptable. Then both constraint equations 3.11 and 3.12 may 
be written as a function of a unique focal f c and follows: 



C ] C 2 



f = - 



a x a 2 + b x b 2 
a 3 a 4 + b 3 b 4 



C 3 C 4 

which may be solved in a least squares fashion, yielding the 
following solution: 



c x c 2 {a x a 2 + b,b 2 ) + c 3 c 4 (a 3 a 4 + b 3 b 4 ) ^ i3) 

/c Ix Iy v cy 2 +c 2 3 c 2 4 

Alternative estimates may be derived by directly solving for 
either one of the constraint equations (3.11 or 3.12) taking 
f x z= / = / c . This may be more appropriate in the case where one of 

the four vanishing points V k is at infinity (corresponding to 
c k = 0 ) . It is then better to drop the associate constraint and 
only consider the remaining one (remark: having a vanishing point 
at infinity does not necessarily mean that the matrix A is 
singular) . Since the vector X H is parallel to the normal vector 
n h of the ground plane Ii h , this rank-one degeneracy case 
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corresponds to having one of the camera axis X c , Y c or Z c parallel 



Note that if two vanishing points are at infinity, then the 
projection of the entire horizon line, is also at infinity on 
the image plane (its two first coordinates are zero) . This 
occurs only when the calibration plane U k is strictly parallel 

to the image plane (or n h =[0 0 if), which is known to be a 
degenerate case where there exists no solution for calibration. 

In the case where the planar pattern is a rectangle, but not 
necessarily a square (or equivalently , the aspect ratio of the 
rectangle is not known) , then the diagonal constraint is not 
available (equation 3.12). In that case, only equation 3.11 is 
available to estimate focal length. It is therefore necessary to 
use a reduced single focal model f — } = f \ 



This expression will be used in a calibration experiment 
illustrated in figure 3.5. 

Once the camera matrix K is estimated, the normalized 
horizon vector X H ~ K*\$ may be recovered. From proposition 2, 
this vector is known to be proportional to the coordinate vector 
&h of U h (or its normal vector n h ) . Therefore, this directly 
provides the orientation in 3D space of the ground plane. The 
only quantity left to estimate is then its absolute distance d h 



to the calibration plane Ii h . 




(3.14) 
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to the camera center, or equivalently the norm 11^11=1/^. This 
step may be achieved by making use of the known area of the 
square ABCD and applying an inverse perspective projection on 
it (possible since the orientation of 11^ is known) . 

Implementation details: In principle, only the four extreme 
corners of the rectangular pattern are necessary to localize the 
four vanishing points Vj p , Vf, V 3 P and V 4 P . However, in order to be 
less sensitive to pixel noise, it is better in practice to make 
use of all the detected corners on the grid (points extracted 
using the Harris corner finder) . This aspect is especially 
important given that vanishing point extraction is known to be 
very sensitive to noise in the corner point coordinates 
(depending on amount of depth perspective in the image) . One 
possible approach is to fit a set of horizontal and vertical 
lines to the pattern points, and then recover the two vanishing 

points V x p , V£ by intersecting them in a least squares fashion. 
Once these two points are extracted, the position of the extreme 
corners of the rectangular pattern may be corrected by enforcing 
the four extreme edges of the grid to go through those vanishing 
points. The next step consists of warping the perspective view 
of the rectangle into a perspective view of a square (making use 
of the known aspect ration of the original rectangle) . The two 

remaining vanishing points V 3 P and V/ may then be localized by 
intersecting the two diagonals of this square with the horizon 
line \£ connecting Vj p and V£ . Once those four points are 
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extracted, the focal length may be estimated, together with the 
plane coordinate vector u) h following the method described 
earlier (using a one or two focal model) . 

When using a 3D calibration rig 

Let us generalize the results to the case where a 3D rig of the 
type in Figure 5 is used for calibration. Figure 16 shows a 
perspective view of a cube in 3D. From that image, one may 

extract seven vanishing points V X ,V 2 , ... V 7 . Similarly to the case 
of a planar square, this set of points must satisfy five 
orthogonality properties : V\ J_ V 2 , V { _L V$ 9 V 2 -L F 3? V 4 _L V 5 and V 6 _L V 7 . 
Then, similarly to equation 3.10, we can write a set of five 
scalar constraints on the pixel coordinates of the vanishing 
points : 

'V X LV 2 (Vff (K T K) (V 2 P ) = 0 

V x _L V 3 (Vff (K T K) (V 3 P ) = 0 

*V 2 ±V 3 (V 2 p f (K T K) (V 3 P ) = 0 (3.15) 

V 4 ±V 5 =» (V p f (K T K) (V p ) = 0 

V 6 ±V 7 => (V p f (K T K) (F/) = 0 

Where K is the intrinsic camera matrix and 

V p ~ \a % b t c t ] T (i = 1 ? ... ? 7) are the pixel coordinate vectors of the 
vanishing points (see figure 16) . Note that the first three 
constraints in (3.15) enforce mutual orthogonality of the faces 
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of the cube, whereas the last two force the left and fight faces 
(and therefore all the others) to be squares. 

Given those five independent constraints, one should be able 
to estimate a full 5 degrees of freedom (DOF) camera model for 
metric calibration including two focal lengths ( f x and f y in 
pixels), the optical center coordinates ( c x and c y in pixels) and 
the skew factor a (see equation 3.2). In that case, the 
intrinsic camera matrix K takes its most general form [31] : 



K = 



Vfx ®/fx -Cx/fx 
0 Vfy -Cyffy 

0 0 1 



This model matches precisely the notation introduced in equation 

T 

(3.2). Then, the semi-positive definite matrix K K may be 
written as follows: 



K T K = \ 
ft 



J_ 

fx 



1 

a 
-c r . 



a 



« 2 + (fx/fyf -&C X - CyUxIfyf 

-ac x - c y (f x /f y ) 2 f x 2 + c 2 x + c 2 y (fjf y ) 2 



(3.16) 



1 


u 5 


—u 3 






—u 4 


— «3 


—u 4 





(3.17) 



Notice that the vanishing point constraints (3.15) are 
homogeneous. Therefore, one can substitute K T K by its 
proportional matrix fl K T K . Doing so, the five constraints 

rp 

listed in equation 3.15 are linear in the vector u = [u 1 ... u 5 ] . 
Indeed, for (t, j) G { (1, 2) , (1,3), (2,3), (4,5), (6,7)}, we have: 
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[ -c,c } - 6,6, ( a t c 3 + a fa ) (b l c j + bfa) - {afij + ap t ) ]« = 0,0, , 

(3.18) 

Therefore, this set of 5 equations may be written in a form of a 
linear system of 5 equations in variable u : 

Au - b (3.19) 



A= 



A is 


a 


5X5 matrix, and & 


a 5-vector: 




-c,c 2 




(a,c 2 + a 2 c x ) 


(V 2 + Z> 2 c,) 


-(a,6 2 + a 2 &i) 






-C,C3 


6,6 3 


(a,c 3 +a 3 c,) 


(61C3 +63C1) 


-(0163+036,) 




Oja 3 


-c 2 c 3 


6 2 6 3 


(a 2 c 3 + a 3 c 2 ) 


(6 2 c 3 + 6 3 c 2 ) 


-(a 2 & 3 + a 3 6 2 ) 


, & = 


a 2 a 3 


— C4C5 


6465 


(a 4 c 5 + a 5 c 4 ) 


(b 4 c 5 +b 5 c 4 ) 


-(a 4 6 5 + a 5 b 4 ) 




a 4 a5 


-c 5 c 7 


6 6 6 v 


(a 6 c 7 +a 7 c 6 ) 


(b 6 c 7 + 6 7 c 6 ) 


-(a 6 b 7 + a 7 6 6 ) 




a 6 a 7 



If the matrix A is invertible, this system admits a 

solution u = A~ l b . Finally, the intrinsic camera parameters are 
retrieved from u as follows: 



fx 

fy 



4 



u \ ~ ^3 



u 4 -u 3 u 5 
u 2 ~u\ 



(3.20) 



a — u 5 

This final step consisting of de-embedding the intrinsic 
parameters from the vector u is equivalent to a Choleski 

decomposition of the matrix in order to retrieve K . 

If A is not invertible, then the camera model needs to be 
reduced. A similar situation occurs when only one face of the 
rectangular parallelepiped is known to be square. In that case, 
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one of the last two constraints of (3.15) has to be dropped, 
leaving only 4 equations. A first model reduction consists of 
setting the skew factor a = 0 , and keeping as unknowns the two 
focals (f x9 f y ) and camera center (c X9 c y ) . That approximation is 
very reasonable for most cameras currently available. The 
resulting camera matrix K has the form: 



K 



Vf* 0 -c x /f x 

0 l/fy ~C y /f y 

0 0 1 



(3.21) 



leading to the following K K matrix: 



K K = -T 

fx 



1 0 -c x 

—C x —Cy(f x / f fy) fx + c x C yifxf fy) 



J_ 
f 2 

Jx 



1 0 -u 3 

0 u 2 —u 4 
-u 3 —u 4 Ui 



Then, each constraint (V/) T (K T K) (V/) = 0 may be written in the 



form of a linear equation in u = [u } ... u 4 ] : 



[ -c t c 3 - b t b 3 ( a % c 3 + a 3 c t ) (b,c J + b 3 c t ) 



u — a t a 3 , 



(3 .22) 



resulting in a 4 x 4 linear system Au—b , admitting the 
solution u if A is rank 4. The intrinsic camera parameters 
{fx>fyi c x> c y) ma Y then be computed from the vector u following the 
set of equations (3.20) setting a = u 5 = 0 . When A has rank less 
than 4, the camera model needs to be further reduced (that is the 
case when only 3 orthogonality constraints are available) . A 
second reduction consists of using a single focal f c ~f x =f yr 
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leading to a 3 DOF model. In that case, the K matrix takes on 
the following reduced expression: 



K = 



l// c 0 
0 l// c 
0 0 



^xl fc 

-Cy/fc 



1 0 



— u 2 



=> K T K = \ 0 1 
L 2 



(/c ^x "I" ) 



/c 2 



—u 2 —u 3 



Then 7 each constraint listed in (3.15) can be written in the form 
of a linear equation of the variable u = [u x u 2 %] T : 



(Vff (K T K) (F/) = 0 O [ -c % c 3 ( a t c 3 + a 3 c % ) (b t c 3 4- b 3 c t ) ] u = (a % a 3 + b x b 3 ) 



Once again, this leads to the linear problem An = b where A is 
a 5 x 3 matrix, and b a 5 -vector (if all five constraints are 
valid) . A least squares solution is in general possible: 

u = (A T A)~ l A T b . Notice that if the faces of the rig are known 
mutually orthogonal but not necessarily square, then only the 
three first constraints of (3.15) are enforceable. In that case, 

the linear system is 3 x 3, and its solution is u = A~ ] b . Once 
the vector u is recovered, the intrinsic camera parameters have 
the following expression: 




u. 



'2 



(3 .23) 



c — u 

y 
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When A has rank less than 3, the model needs to be further 
reduced to 2 or 1 DOF by taking the optical center out of the 
model (fixing it to the center of the image) and then going back 
to the original model adopted in the planar rig case (one or two 
focal models) . 

This system can enable decoupling intrinsic from extrinsic 
parameters and derive a set of closed-form solutions for 
intrinsic camera calibration in case of five model orders: 1, 2, 
3, 4 and 5 degrees of freedom. In addition, we stated conditions 
of observability of those models under different experimental 
situations corresponding to planar and three-dimensional rigs. 
The following table summarizes the results by giving, for each 
model order, the list of parameters we have retrieved explicit 
expressions for, as well as the minimum structural rig necessary 
to estimate the model: 
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Model order 


Parameters 


Calibration rig (minimum required 
structure) 


1 DOF 


f fx fy 


2D rectangle 


2 DOF 


fx? fy 


2D square 


3 DOF 


f fx fy? C x ^Cy 


3D rectangular parallelepiped 


4 DOF 


fx ? fy? ^ x ? ^ y 


3D rectangular parallelepiped with 
one square face 


5 DOF 


fx? fyr>^x? ^y ? ^ 


3D cube 



One could use this explicit set of solutions for calibrating 
image sequences where intrinsic camera parameters are time 
varying. A typical sequence could be a flyby movie over a city 
with buildings. 

In a broader sense, this work provides a general framework 
for approaching problems involving reconstruction of three- 
dimensional scenes with known structural constraints (for example 
orthogonality of building walls in a picture of a city) . Indeed, 
constraints, such as parallelism or orthogonality, find very 
compact and exploitable representations in dual-space. This 
approach may avoid traditional iterative minimization techniques 
(computationally expensive) in order to exploit constraints in a 
scene . 
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Although only a few embodiments have been disclosed in 
detail above, other embodiments are possible. 
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